Cross-Lingual Syntactic Transfer with Limited Resources
Abstract
We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers in the scenario where a large amount of translation data is not available. The method consists of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state of the art in several languages used in previous work (Rasooli and Collins, 2015; Zhang and Barzilay, 2015; Ammar et al., 2016), in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets (26 languages) from the Universal Dependencies corpora: 13 datasets (10 languages) have unlabeled attachment accuracies of 80% or higher; the average unlabeled accuracy on the 38 datasets is 74.8%.
Similar resources
A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources
We describe a Machine Translation (MT) approach that is specifically designed to enable rapid development of MT for languages with limited amounts of online resources. Our approach assumes the availability of a small number of bilingual speakers of the two languages, but these need not be linguistic experts. The bilingual speakers create a comparatively small corpus of word-aligned phrases an...
Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation
Existing approaches to automatic VerbNet-style verb classification are heavily dependent on feature engineering and therefore limited to languages with mature NLP pipelines. In this work, we propose a novel cross-lingual transfer method for inducing VerbNets for multiple languages. To the best of our knowledge, this is the first study which demonstrates how the architectures for learning word em...
Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
Täckström, O. 2013. Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision. Acta Universitatis Upsaliensis. Studia Linguistica Upsaliensia 14. xii+215 pp. Uppsala. ISBN 978-91-554-8631-0. Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the ling...
Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources
Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables ...
Language engineering for syntactic knowledge transfer
In this paper we present a method for constructing an English-Romanian treebank, together with the evaluation results obtained. The treebank is built upon a parallel English-Romanian corpus that is word-aligned and annotated at the morphological and syntactic level. The syntactic trees of the Romanian texts are generated by considering the syntactic phrases of the English parallel texts automatically r...
Journal title:
- TACL
Volume 5
Publication date 2017